Add amdgpu target #134740

Flakebi · 2024-12-25T00:32:48Z

Add amdgpu target to rustc and enable the LLVM target.

Fix compiling `core` for amdgpu:
The amdgpu backend makes heavy use of different address spaces. This
leads to situations where a pointer in one addrspace needs to be cast
to a pointer in a different addrspace. `bitcast` is invalid for this
case; `addrspacecast` needs to be used.

Fix compilation failures that created bitcasts for such cases by
creating pointer casts (which create an `addrspacecast` under the hood)
instead.
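
To illustrate the rule described above, here is a minimal sketch of the decision, not the actual rustc or `codegen_llvm` code. The names `AddressSpace`, `PointerCast` and `pointer_cast_kind` are hypothetical, and the address-space numbers in the usage note are only example values.

```rust
/// Hypothetical stand-in for an LLVM address-space number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AddressSpace(u32);

/// Which kind of conversion a pointer-to-pointer cast needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PointerCast {
    /// Source and destination share an address space, so no
    /// address-space conversion is required.
    SameSpace,
    /// The address spaces differ. A `bitcast` would be invalid here;
    /// in LLVM IR this looks like
    /// `%dst = addrspacecast ptr addrspace(5) %src to ptr`.
    AddrSpaceCast,
}

/// Pick the cast kind from the two address spaces alone.
fn pointer_cast_kind(from: AddressSpace, to: AddressSpace) -> PointerCast {
    if from == to {
        PointerCast::SameSpace
    } else {
        PointerCast::AddrSpaceCast
    }
}
```

For example, `pointer_cast_kind(AddressSpace(5), AddressSpace(0))` yields `PointerCast::AddrSpaceCast`, while casting within one address space yields `PointerCast::SameSpace`.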

MCP: rust-lang/compiler-team#823
Tracking issue: #135024
Kinda related to the original amdgpu tracking issue #51575 (though that one has been closed for a while).

rustbot · 2024-12-25T00:32:56Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @GuillaumeGomez (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should keep the review label (S-waiting-on-review or S-waiting-on-author) up to date, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

rustbot · 2024-12-25T00:32:58Z

These commits modify compiler targets.
(See the Target Tier Policy.)

This PR changes how LLVM is built. Consider updating src/bootstrap/download-ci-llvm-stamp.

Some changes occurred in src/doc/rustc/src/platform-support

cc @Noratrieb

This PR modifies config.example.toml.

If appropriate, please update CONFIG_CHANGE_HISTORY in src/bootstrap/src/utils/change_tracker.rs.

jieyouxu · 2024-12-25T06:48:58Z

r? jieyouxu

workingjubilee · 2024-12-25T07:24:46Z

cc @eddyb Hello, tagging you for domain expertise if you want to chime in.

jieyouxu · 2024-12-25T07:28:02Z

Thanks for the PR, @Flakebi. I'm going to request that you open an MCP at https://github.com/rust-lang/compiler-team/issues/ to gauge team consensus for adding this target, primarily to give compiler team members some opportunity to ask clarifying questions and register possible concerns, since:

- Adding this target requires modifying codegen_llvm in a non-trivial way (at times emitting addrspacecast instead of bitcast). In particular, as you stated, this target has a non-conventional addrspace usage model that I believe we don't quite observe in other existing targets:

  > The amdgpu backend makes heavy use of different address spaces. This leads to situations where a pointer in one addrspace needs to be cast to a pointer in a different addrspace. bitcast is invalid for this case; addrspacecast needs to be used.
- This requires modifying the LLVM build to also include the AMDGPU backend.
- This target seems to be intended for many different CPUs of varying hardware generation, but the present target definition defaults to gfx900.

Note that usually adding more "conventional" Tier 3 targets does not need to go through the MCP process, but this target looks not so conventional.

jieyouxu · 2024-12-25T07:29:37Z

@rustbot author

compiler/rustc_target/src/spec/targets/amdgcn_amd_amdhsa.rs

Flakebi · 2024-12-26T01:15:07Z

Thank you for the quick review!

I opened an MCP here: rust-lang/compiler-team#823

traviscross · 2024-12-26T01:36:33Z

cc @ZuseZ4

bors · 2024-12-27T16:39:23Z

☔ The latest upstream changes (presumably #134822) made this pull request unmergeable. Please resolve the merge conflicts.

Add amdgpu target

Add amdgpu target to rustc and enable the LLVM target.

Fix compiling `core` with the amdgpu:
The amdgpu backend makes heavy use of different address spaces. This leads to situations, where a pointer in one addrspace needs to be casted to a pointer in a different addrspace. `bitcast` is invalid for this case, `addrspacecast` needs to be used.

Fix compilation failures that created bitcasts for such cases by creating pointer casts (which creates an `addrspacecast` under the hood) instead.

MCP: rust-lang/compiler-team#823
Tracking issue: rust-lang#135024
Kinda related to the original amdgpu tracking issue rust-lang#51575 (though that one has been closed for a while).

try-job: dist-loongarch64-linux
try-job: dist-loongarch64-muls
try-job: dist-powerpc64-linux

rust-log-analyzer · 2025-02-09T09:46:19Z

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

dist-loongarch64-linux
dist-loongarch64-muls
dist-powerpc64-linux
##[endgroup]
INFO:root:Job type: TryRunType(custom_jobs=['dist-loongarch64-linux', 'dist-loongarch64-muls', 'dist-powerpc64-linux'])
  File "/home/runner/work/rust/rust/src/ci/github-actions/ci.py", line 314, in <module>
    calculate_job_matrix(data)
  File "/home/runner/work/rust/rust/src/ci/github-actions/ci.py", line 266, in calculate_job_matrix
    jobs = calculate_jobs(run_type, job_data)
  File "/home/runner/work/rust/rust/src/ci/github-actions/ci.py", line 153, in calculate_jobs
    raise Exception(
Exception: Custom job(s) `['dist-loongarch64-muls']` not found in auto jobs
##[error]Process completed with exit code 1.

matthiaskrgr · 2025-02-09T09:49:32Z

ah, it's called musl and not muls :)
@bors try

bors · 2025-02-09T09:50:43Z

⌛ Trying commit 56795fb with merge ad8d586...

Add amdgpu target

Add amdgpu target to rustc and enable the LLVM target.

Fix compiling `core` with the amdgpu:
The amdgpu backend makes heavy use of different address spaces. This leads to situations, where a pointer in one addrspace needs to be casted to a pointer in a different addrspace. `bitcast` is invalid for this case, `addrspacecast` needs to be used.

Fix compilation failures that created bitcasts for such cases by creating pointer casts (which creates an `addrspacecast` under the hood) instead.

MCP: rust-lang/compiler-team#823
Tracking issue: rust-lang#135024
Kinda related to the original amdgpu tracking issue rust-lang#51575 (though that one has been closed for a while).

try-job: dist-loongarch64-linux
try-job: dist-loongarch64-musl
try-job: dist-powerpc64-linux

bors · 2025-02-09T11:41:26Z

☀️ Try build successful - checks-actions
Build commit: ad8d586 (ad8d58687ff5e1b7935c4b25be4e251d15443948)

saethlin · 2025-02-09T19:38:52Z

@bors r=workingjubilee

bors · 2025-02-09T19:38:54Z

💡 This pull request was already approved, no need to approve it again.

There's another pull request that is currently being tested, blocking this pull request: Always set the deployment target when building std #133092

bors · 2025-02-09T19:38:55Z

📌 Commit 56795fb has been approved by workingjubilee

It is now in the queue for this repository.

bors · 2025-02-10T05:18:39Z

⌛ Testing commit 56795fb with merge c03c38d...

bors · 2025-02-10T08:13:25Z

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing c03c38d to master...

rust-timer · 2025-02-10T09:30:25Z

Finished benchmarking commit (c03c38d): comparison URL.

Overall result: ❌ regressions - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

- If the regression was expected or you think it can be justified, please write a comment with sufficient written justification, and add @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
- If you think that you know of a way to resolve the regression, try to create a new PR with a fix for the regression.
- If you do not understand the regression or you think that it is just noise, you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.2%, 3.8%] | 55 |
| Regressions ❌ (secondary) | 0.6% | [0.2%, 1.0%] | 47 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.7% | [0.2%, 3.8%] | 55 |

Max RSS (memory usage)

Results (primary 2.1%, secondary 3.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [0.4%, 5.8%] | 93 |
| Regressions ❌ (secondary) | 3.4% | [0.5%, 6.1%] | 96 |
| Improvements ✅ (primary) | -6.8% | [-6.8%, -6.8%] | 1 |
| Improvements ✅ (secondary) | -1.5% | [-1.5%, -1.5%] | 1 |
| All ❌✅ (primary) | 2.1% | [-6.8%, 5.8%] | 94 |

Cycles

Results (primary 2.4%, secondary -2.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.4% | [0.6%, 3.6%] | 3 |
| Regressions ❌ (secondary) | 2.4% | [1.3%, 3.2%] | 4 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -5.4% | [-9.7%, -4.0%] | 7 |
| All ❌✅ (primary) | 2.4% | [0.6%, 3.6%] | 3 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 777.876s -> 781.62s (0.48%)
Artifact size: 329.17 MiB -> 348.30 MiB (5.81%)

Kobzol · 2025-02-10T09:52:02Z

The syn regression is noise, the rest is possibly caused by the fact that the shipped LLVM is now 19 MiB larger due to the inclusion of the AMDGPU target.

Mark-Simulacrum · 2025-02-10T13:36:37Z

I'm not sure that size is really a driver for more instruction counts, but it does look like the mere possibility of cross-compiling to AMDGPU is enabling more passes/logic(?) even if presumably those don't do anything on x86. Maybe an optimization opportunity for LLVM/clang? It might be unavoidable with the architecture LLVM has today though.

Sampling a few cachegrind diffs:

helloworld:

--------------------------------------------------------------------------------
-- File:function summary
--------------------------------------------------------------------------------
  Ir______  file:function

<  575,695  ???:
   844,600    llvm::PassRegistry::enumerateWith(llvm::PassRegistrationListener*)
  -782,786    llvm::PassRegistry::enumerateWith(llvm::PassRegistrationListener*) [clone .warm]
   -69,691    llvm::FPPassManager::runOnFunction(llvm::Function&)
    60,129    llvm::MVT::getScalableVectorVT(llvm::MVT, unsigned int)
    54,986    ecache_evict
   -49,525    llvm::SelectionDAGISel::CodeGenAndEmitDAG()
   -41,251    std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > std::copy<llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrSet<llvm::BasicBlock*, 8u>, false, ll>
    40,624    llvm::StringMapImpl::RehashTable(unsigned int)
    39,256    llvm::StringMapImpl::LookupBucketFor(llvm::StringRef, unsigned int)
    38,978    edata_cache_get
    38,420    llvm::InstCombinerImpl::visitCallBase(llvm::CallBase&)
    37,509    llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>::invalidate(llvm::LazyCallGraph::SCC&, llvm::PreservedAnalyses const&)
    36,129    llvm::SelectionDAG::Legalize()
   -33,750    llvm::PassRegistry::registerPass(llvm::PassInfo const&, bool)
   -33,501    llvm::InstCombinerImpl::visitCallInst(llvm::CallInst&)
   -31,525    tcache_bin_flush_small
    31,343    llvm::PMTopLevelManager::findAnalysisPassInfo(void const*) const
    30,580    eset_remove

clap:

--------------------------------------------------------------------------------
-- File:function summary
--------------------------------------------------------------------------------
  Ir__________  file:function

<  209,305,246  ???:
  -230,519,105    llvm::computeKnownBitsFromContext(llvm::Value const*, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&)
  -132,162,101    computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&) [clone
   124,686,235    computeKnownBitsFromOperator(llvm::Operator const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&)
   114,060,111    llvm::LiveIntervalCalc::calculate(llvm::LiveInterval&, bool)
  -113,662,179    llvm::LiveIntervals::computeVirtRegs()
   104,218,878    llvm::PointerMayBeCaptured(llvm::Value const*, llvm::CaptureTracker*, unsigned int)
   -97,035,202    llvm::SelectionDAGISel::CodeGenAndEmitDAG()
   -90,098,002    llvm::AAResults::getModRefInfo(llvm::Instruction const*, std::optional<llvm::MemoryLocation> const&, llvm::AAQueryInfo&)
    89,433,445    llvm::RAGreedy::calculateRegionSplitCostAroundReg(unsigned short, llvm::AllocationOrder&, llvm::BlockFrequency&, unsigned int&, unsigned int&)
   -83,724,622    llvm::RAGreedy::calculateRegionSplitCost(llvm::LiveInterval const&, llvm::AllocationOrder&, llvm::BlockFrequency&, unsigned int&, bool)
    82,820,615    computeKnownBits(llvm::Value const*, llvm::APInt const&, llvm::KnownBits&, unsigned int, llvm::SimplifyQuery const&) [clone
   -72,563,818    llvm::ScheduleDAGSDNodes::BuildSchedGraph(llvm::AAResults*)
    71,100,555    llvm::SelectionDAG::Legalize()
   -70,774,808    std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > std::copy<llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrSet<llvm::BasicBlock*, 8u>, false, llvm::GraphTraits<llvm::BasicBlock*> >, std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > >(llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrSet<llvm::BasicBlo>
    68,499,576    llvm::ScheduleDAGSDNodes::AddSchedEdges()
   -67,376,856    (anonymous namespace)::TailRecursionEliminator::eliminate(llvm::Function&, llvm::TargetTransformInfo const*, llvm::AAResults*, llvm::OptimizationRemarkEmitter*, llvm::DomTreeUpdater&) [clone
    66,298,390    computePointerICmp(llvm::CmpInst::Predicate, llvm::Value*, llvm::Value*, llvm::SimplifyQuery const&)
    62,749,680    std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > std::__copy_move_a2<false, llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrSet<llvm::BasicBlock*, 8u>, false, llvm::GraphTraits<llvm::BasicBlock*> >, std::back_insert_iterator<llvm::SmallVector<llvm::BasicBlock*, 8u> > >(llvm::po_iterator<llvm::BasicBlock*, llvm::SmallPtrS>
   -62,208,865    simplifyICmpInst(unsigned int, llvm::Value*, llvm::Value*, llvm::SimplifyQuery const&, unsigned int) [clone
    59,398,553    llvm::SCCPInstVisitor::markUsersAsChanged(llvm::Value*)
   -58,896,104    llvm::RegAllocBase::allocatePhysRegs()
    57,899,944    llvm::RAGreedy::selectOrSplitImpl(llvm::LiveInterval const&, llvm::SmallVectorImpl<llvm::Register>&, llvm::SmallSet<llvm::Register, 16u, std::less<llvm::Register> >&, llvm::SmallVector<std::pair<llvm::LiveInterval const*, llvm::MCRegister>, 8u>&, unsigned int)

lqd · 2025-02-10T13:54:47Z

what the eff. we should probably look into this more, it’s super weird

Kobzol · 2025-02-11T07:03:56Z

I remember that binary/dynamic library size increases have sometimes also increased icounts, due to the dynamic linker doing more work. But based on Cachegrind, it looks like LLVM is actually doing more work; maybe it iterates over more passes that were enabled by the amdgpu target?

lqd · 2025-02-11T07:30:45Z

The max-rss increases also look unexpected, and numerous enough to not be measurement noise. (Could memory allocation be in these ??? cg reports? Sometimes it does this for me rather than finding jemalloc/malloc. Some tests with local builds and better debuginfo could be interesting. That would also help with checking the cycles and wall time results, which seemingly aren’t super stable in these results.)

Does this need more time to bake maybe? @Mark-Simulacrum you’ve marked this as triaged because it’s less actionable on our side than in llvm, right?

[experiment] dont init anything except x86

What if we do not always initialize all LLVM targets? This may fix the regression in rust-lang#134740.

r? `@ghost`
`@rustbot` label +S-experimental

By the way, there is a similar list of targets at https://github.com/rust-lang/rust/blob/c182ce9cbc8c29ebc1b4559d027df545e6cdd287/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp#L81-L186, but it is missing amdgpu. Does amdgpu work without it?

Please kick off a perf run.

workingjubilee · 2025-02-11T11:06:38Z

hate to go "ooh, LLVM troubles, let's tell Nikita!" but uhhhh "weird LLVM perf" really does need the Vibe Sense of that kind of expertise, sooo cc @nikic

nikic · 2025-02-11T11:18:27Z

Adding the amdgpu target shouldn't make any additional passes run -- additional cost from registering additional passes etc is plausible though.

max-rss increasing with increasing code size is pretty common.

rustbot assigned GuillaumeGomez Dec 25, 2024

Flakebi mentioned this pull request Dec 25, 2024

Disable f128 for amdgpu rust-lang/compiler-builtins#737

Merged

rustbot assigned jieyouxu and unassigned GuillaumeGomez Dec 25, 2024

jieyouxu added the needs-mcp [This change is large enough that it needs a major change proposal before starting work.] and A-LLVM [Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.] labels Dec 25, 2024

rustbot added the S-waiting-on-author [Status: This is awaiting some action (such as code changes or more information) from the author.] label and removed the S-waiting-on-review [Status: Awaiting review from the assignee but also interested parties.] label Dec 25, 2024

bjorn3 reviewed Dec 25, 2024

compiler/rustc_target/src/spec/targets/amdgcn_amd_amdhsa.rs

bjorn3 reviewed Dec 25, 2024

compiler/rustc_target/src/spec/targets/amdgcn_amd_amdhsa.rs

Flakebi mentioned this pull request Dec 26, 2024

Add amdgpu target rust-lang/compiler-team#823

Closed

3 tasks

bors added the S-waiting-on-author [Status: This is awaiting some action (such as code changes or more information) from the author.] label Dec 27, 2024

jieyouxu removed the S-waiting-on-author [Status: This is awaiting some action (such as code changes or more information) from the author.] label Dec 31, 2024

Flakebi mentioned this pull request Jan 2, 2025

Tracking Issue for amdgpu target #135024

Open

16 tasks

rustbot added the has-merge-commits [PR has merge commits, merge with caution.] label Jan 2, 2025

bors added the S-waiting-on-bors [Status: Waiting on bors to run and complete tests. Bors will change the label on completion.] label and removed the S-waiting-on-author [Status: This is awaiting some action (such as code changes or more information) from the author.] label Feb 9, 2025

bors added the merged-by-bors [This PR was explicitly merged by bors.] label Feb 10, 2025

bors merged commit c03c38d into rust-lang:master Feb 10, 2025
7 checks passed

rustbot added this to the 1.86.0 milestone Feb 10, 2025

bors mentioned this pull request Feb 10, 2025

Add cygwin target. #134999

Open

rustbot added the perf-regression [Performance regression.] label Feb 10, 2025

Flakebi deleted the amdgpu-target branch February 10, 2025 10:04

Mark-Simulacrum added the perf-regression-triaged [The performance regression has been triaged.] label Feb 10, 2025

klensy mentioned this pull request Feb 11, 2025

[experiment] dont init anything except x86 #136861

Open
